[ENH]: Plumb prefix for spann and hnsw segment #4753
base: 06-03-_enh_plumb_prefix_path_all_the_way_to_the_bf_writer
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving.
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.
Plumb Prefix Path Support Through Segment, HNSW, and Blockstore Codebase

This PR introduces comprehensive propagation of path prefixes into segment and HNSW/hnswlib-related data storage and retrieval. Paths are now constructed and threaded through all I/O layers for segment-based, blockfile, and hnsw indices. This affects construction, read, write, fork, flush, prefetch, and blockfile location, ensuring distinct collection isolation, cross-cloud compatibility, and addressing future requirements for prefix-based multi-tenancy in storage.

Potential Impact:
• Functionality: All code paths that read, write, fork, or flush blocks/index/data now require and use a correct segment and prefix_path context; incorrect or missing prefixes will produce errors or lead to storage isolation violations.
• Performance: Prefetch, cache, and block location performance may see minor penalties due to added prefix computations, though most changes are in I/O glue and not hot path.
• Security: Stronger isolation between tenants/collections due to forced prefix scoping in storage access patterns.
• Scalability: Supports future multi-cloud or multi-tenant scale-out via path partitioning; failure to provide/validate prefixes may break existing deployments not migrated to prefix-aware storage.

Testing Needed
• Full test pass for rust/segment, rust/index, and rust/blockstore (including proptests).

Code Quality Assessment
• rust/segment/src/blockfile_record.rs: Mostly safe refactoring and threading. Large function parameter lists now take prefix_path from segment. Error enums and match arms are updated properly.
• rust/index/src/spann/types.rs: Careful propagation of prefix_path, additional arguments in multi-argument methods, explicit validation of consistency.
• rust/blockstore/src/arrow/blockfile.rs: All block get/fork/load/flush paths updated for new path, tests adapted. Clean use of new BlockfileReaderOptions for prefix clarity.
• rust/garbage_collector/src/operators/compute_unused_files.rs: Careful path parsing and key generation per new logic. Adequately tested for edge cases.
• error-handling: More explicit error types (InvalidPrefixPath) and guard rails to prevent inconsistent state; generally improved error reporting.

Potential Issues
• If upstream code fails to call new functions with the right prefix_path, data corruption or cross-collection leaks are possible.

This summary was automatically generated by @propel-code-bot
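To make the prefix-scoping idea in the summary above concrete, here is a minimal sketch of how a storage key might be built from a prefix path and parsed back again (the latter being roughly what a garbage collector would need to do). Everything in it is an assumption for illustration: the function names `prefixed_block_key` and `split_block_key`, the `KeyError` type, and the `<prefix_path>/block/<block_id>` layout are hypothetical and are not taken from this PR's diff. Only the name `InvalidPrefixPath` comes from the summary, and its real shape may differ.

```rust
/// Hypothetical error type. The summary mentions an `InvalidPrefixPath`
/// error, but its actual definition in the codebase is not shown here.
#[derive(Debug, PartialEq)]
pub enum KeyError {
    InvalidPrefixPath(String),
}

/// Builds a key of the assumed form `<prefix_path>/block/<block_id>`.
/// Leading and trailing slashes on the prefix are trimmed; an empty
/// prefix is rejected so that a key can never escape its scope.
pub fn prefixed_block_key(prefix_path: &str, block_id: &str) -> Result<String, KeyError> {
    let trimmed = prefix_path.trim_matches('/');
    if trimmed.is_empty() {
        return Err(KeyError::InvalidPrefixPath(prefix_path.to_string()));
    }
    Ok(format!("{}/block/{}", trimmed, block_id))
}

/// Splits a key of the assumed layout back into `(prefix_path, block_id)`.
/// Returns `None` when the key does not carry a non-empty prefix and id.
pub fn split_block_key(key: &str) -> Option<(&str, &str)> {
    let (prefix, id) = key.rsplit_once("/block/")?;
    if prefix.is_empty() || id.is_empty() {
        return None;
    }
    Some((prefix, id))
}
```

Rejecting an empty prefix at key-construction time, rather than at the storage layer, is one way to get the "guard rails" the summary describes: a caller that forgets to pass prefix_path gets an error instead of silently writing to an unscoped location.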
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?